Scalability of Parallel Genetic Algorithm for Two-mode Clustering
نویسندگان
چکیده
Data matrix having the same set of entity in the rows and cloumns is known as one-mode data matrix, and traditional one-mode clustering algorithms can be used to cluster the rows (or columns) separately. With the popularity of use of two-mode data matrices where the rows and columns have different sets of entities, the need for simultaneous clustering of rows and columns popularly known as two-mode clustering increased. Additionally, the emergence of large data sets and the prediction of Moore's law slow-down have created the challenge of clustering scalability. In this paper, we address the problem of scalability of organizing an unlabelled twomode dataset into clusters utilizing multicore processor. We propose a parallel genetic algorithm (GA) heuristics based two-mode clustering algorithm, which is an adaptation of the classical Cuthill-McKee Matrix Bandwidth Minimization (MBM) algorithm. The classical MBM method aims at reducing the bandwidth of a sparse symmetric matrix, which we adapted to make it suitable for non-symmetric real-valued matrix. Preliminary results indicate that our algorithm is scalable on multicore processor compared to serial implementation. Future work will include more extensive experiments and evaluations of the system.
منابع مشابه
خوشهبندی دادهها بر پایه شناسایی کلید
Clustering has been one of the main building blocks in the fields of machine learning and computer vision. Given a pair-wise distance measure, it is challenging to find a proper way to identify a subset of representative exemplars and its associated cluster structures. Recent trend on big data analysis poses a more demanding requirement on new clustering algorithm to be both scalable and accura...
متن کاملA new multi-mode and multi-product hub covering problem: A priority M/M/c queue approach
One main group of a transportation network is a discrete hub covering problem that seeks to minimize the total transportation cost. This paper presents a multi-product and multi-mode hub covering model, in which the transportation time depends on travelling mode between each pair of hubs. Indeed, the nature of products is considered different and hub capacity constraint is also applied. Due to ...
متن کاملModeling and Solution Procedure for a Preemptive Multi-Objective Multi-Mode Project Scheduling Model in Resource Investment Problems
In this paper, a preemptive multi-objective multi-mode project scheduling model for resource investment problem is proposed. The first objective function is to minimize the completion time of project (makespan);the second objective function is to minimize the cost of using renewable resources. Non-renewable resources are also considered as parameters in this model. The preemption of activities ...
متن کاملStatic Task Allocation in Distributed Systems Using Parallel Genetic Algorithm
Over the past two decades, PC speeds have increased from a few instructions per second to several million instructions per second. The tremendous speed of today's networks as well as the increasing need for high-performance systems has made researchers interested in parallel and distributed computing. The rapid growth of distributed systems has led to a variety of problems. Task allocation is a...
متن کاملHigh Performance Implementation of Fuzzy C-Means and Watershed Algorithms for MRI Segmentation
Image segmentation is one of the most common steps in digital image processing. The area many image segmentation algorithms (e.g., thresholding, edge detection, and region growing) employed for classifying a digital image into different segments. In this connection, finding a suitable algorithm for medical image segmentation is a challenging task due to mainly the noise, low contrast, and steep...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014